TNO – MaffiaFinder

VAST 2008 Challenge
Mini Challenge 3:  Cell Phone Calls 

Authors and Affiliations:

Erik Boertjes, TNO Information and Communication technology, Delft, The Netherlands,
before July 26th : erikboertjes@hotmail.com ,after July 26th:  erik.boertjes@tno.nl  [PRIMARY contact]


Student team: NO

Tool(s):

The tool was developed and implemented in June 2008 by TNO specifically for this challenge. We believe, however, that its use is not limited to this challenge. It lets the user view and interact with the data from three different viewpoints:

1)       Sequence diagram: plots the phone calls analogous to a UML sequence diagram. Each cell phone is represented by a vertical line. Each call from one cell phone to another is represented by a horizontal line between the corresponding vertical lines. Time runs from top to bottom, which makes the order of the calls in time visible. A clustering algorithm (Cluto) places lines with many calls between them close to each other. This improves the readability of the diagram.

2)       Graph: each node is a cell phone, and each edge represents the aggregation of calls between two phones. Aggregation can be performed per day or over the complete 10-day period.

3)       Color bar fingerprints: each cell phone is represented by a colored bar that visualizes the locations of the cell towers from which calls were made with that phone.
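The edge aggregation used by the graph view can be sketched as follows. This is a minimal illustration, not the tool's actual code: the record layout (caller, callee, start time, duration in seconds, cell tower), the sample values, and the function name `aggregate_edges` are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical call records: (caller_id, callee_id, start_time, duration_seconds, cell_tower)
sample_calls = [
    (200, 5, datetime(2006, 6, 1, 9, 0), 120, 29),
    (5, 200, datetime(2006, 6, 1, 10, 30), 60, 30),
    (200, 1, datetime(2006, 6, 2, 8, 15), 300, 11),
]

def aggregate_edges(calls, per_day=False):
    """Count calls per undirected phone pair, optionally split by calendar day."""
    counts = Counter()
    for caller, callee, start, _duration, _tower in calls:
        pair = (min(caller, callee), max(caller, callee))
        key = (start.date(), pair) if per_day else pair
        counts[key] += 1
    return counts

total_edges = aggregate_edges(sample_calls)
daily_edges = aggregate_edges(sample_calls, per_day=True)
```

The resulting counts could then drive edge thickness in a graph drawing, either for one day or for the whole period.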

Screenshot 1: Sequence diagram view (click here for high-res version)

 

Screenshot 2: Graph view (click here for high-res version)

 

Screenshot 3: Color bar view (click here for high-res version)

 

Two Page Summary:   NO (but I would love to make one in case I am awarded a certificate :-) )

 

 

ANSWERS:


Phone-1: What is the Catalano/Vidro social network, as reflected in the cell phone call data, at the end of the time period?

   PhoneNodes.txt

   PhoneLinks.txt


Phone-2: Characterize the changes in the Catalano/Vidro social structure over the ten day period.

Detailed Answer:

The first given clue is that Fernando is ID 200. We start by locating this ID in the graph view, which shows the number of calls between each pair of phones (nodes) summed over the 10-day period. Edge thickness represents the number of calls. A force-directed layout algorithm causes nodes that call each other frequently to cluster. The tool also supports searching for a specific ID. Figure 1 shows that Fernando exchanged calls with IDs 1, 2, 3, 5, 97 and 137. Hovering over a link shows the number of calls to and from each node. Fernando called ID 5 most frequently, so we assume that ID 5 is his brother Estaban.
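The lookup described above (who does a given ID talk to, and with whom most often) can be sketched in a few lines. The call list below is invented for illustration and does not reproduce the challenge data; `call_partners` and `most_frequent_partner` are hypothetical helper names.

```python
from collections import defaultdict

# Invented (caller_id, callee_id) pairs, for illustration only
sample_pairs = [(200, 5), (5, 200), (200, 5), (200, 1), (97, 200), (3, 200)]

def call_partners(calls, phone_id):
    """Return {partner: [calls_made_to_partner, calls_received_from_partner]}."""
    partners = defaultdict(lambda: [0, 0])
    for caller, callee in calls:
        if caller == phone_id:
            partners[callee][0] += 1
        elif callee == phone_id:
            partners[caller][1] += 1
    return dict(partners)

def most_frequent_partner(calls, phone_id):
    """Partner with the highest combined number of calls in both directions."""
    partners = call_partners(calls, phone_id)
    return max(partners, key=lambda p: sum(partners[p]))

contacts_of_200 = call_partners(sample_pairs, 200)
top_contact = most_frequent_partner(sample_pairs, 200)
```

Note that `most_frequent_partner` raises a `ValueError` for a phone with no calls at all, which a real tool would need to handle.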

 

Figure 1 – phone connections with ID 200 (click here for high-res version)

 

By ticking the ‘+1’ box, the tool also displays connections that are ‘1 step away’. Figure 2 shows that, starting from Fernando, almost half of all the phones on the island can be reached through the 6 people that he calls. Nodes 1, 3 and 5 account for most of this: they have a large number of different connections compared to other nodes.
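The ‘1 step away’ expansion amounts to collecting every phone within two hops of the selected ID. A minimal sketch, assuming calls are available as (caller, callee) pairs and treating them as undirected links; the data and function names are made up for the example:

```python
from collections import defaultdict

def build_adjacency(calls):
    """Undirected adjacency sets from (caller_id, callee_id) pairs."""
    adj = defaultdict(set)
    for caller, callee in calls:
        adj[caller].add(callee)
        adj[callee].add(caller)
    return adj

def within_hops(adj, start, max_hops):
    """All phones reachable from `start` in at most `max_hops` steps."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        # Expand one level and keep only phones not visited before
        frontier = {n for f in frontier for n in adj.get(f, ())} - seen
        seen |= frontier
    return seen - {start}

# Invented sample: 200 calls 1 and 5, who in turn reach 10 and 11
sample_links = [(200, 1), (200, 5), (1, 10), (5, 11), (11, 12)]
adjacency = build_adjacency(sample_links)
direct = within_hops(adjacency, 200, 1)
one_step_away = within_hops(adjacency, 200, 2)
```

Dividing the size of the two-hop set by the total number of phones gives the kind of "almost half of the island" figure mentioned above.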

 

Figure 2 – Fernando’s social contacts reach many people (click here for high-res version)

 

With these 7 IDs selected, we switch to the sequence diagram view. This view shows the 7 cell phones as vertical lines, and the calls between them as horizontal arrows (Figure 3). Time is displayed from top to bottom.

 

Figure 3 – sequence diagram of 7 phones and their calls (click here for high-res version)

 

Hovering over a vertical line highlights the calls from and to the corresponding cell phone, and shows their number in a window. Among these 7, ID 5 receives the most calls. Could he be David Vidro, the one coordinating Paraiso activities? (We think not, see below.) Zooming in (Figure 4) reveals an interesting pattern: ID 1 calls IDs 2, 3 and 5, and these calls overlap in time. Besides some calls with negative duration (which we removed from the database), we found many such overlapping calls in the dataset. We even found 2 persons (IDs 138 and 321) having two calls with each other at the same time (on 2006-06-08 at 12:41).
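Both data checks mentioned here, dropping calls with negative duration and finding calls of one phone that overlap in time, are easy to express in code. The sketch below assumes records of the form (caller, callee, start time, duration in seconds); the sample values are invented and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def drop_invalid(calls):
    """Remove calls with a negative duration."""
    return [c for c in calls if c[3] >= 0]

def overlapping_calls(calls, phone_id):
    """Return pairs of calls involving phone_id whose time intervals overlap."""
    mine = sorted((c for c in calls if phone_id in (c[0], c[1])), key=lambda c: c[2])
    overlaps = []
    for i, a in enumerate(mine):
        a_end = a[2] + timedelta(seconds=a[3])
        for b in mine[i + 1:]:
            if b[2] >= a_end:
                break  # sorted by start time: no later call can overlap `a`
            overlaps.append((a, b))
    return overlaps

# Invented sample: ID 1 starts a call to ID 3 while still on the phone with ID 2
sample_records = [
    (1, 2, datetime(2006, 6, 3, 12, 0), 600),
    (1, 3, datetime(2006, 6, 3, 12, 5), 60),
    (1, 5, datetime(2006, 6, 3, 12, 20), 60),
    (2, 3, datetime(2006, 6, 3, 13, 0), -30),  # negative duration: dropped
]
valid_records = drop_invalid(sample_records)
overlap_pairs = overlapping_calls(valid_records, 1)
```

Sorting by start time lets the inner loop stop early, which keeps the check cheap when a phone has many short calls.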

 

Figure 4 – sequence diagram – zoomed in (click here for high-res version)

 

Next, we looked at the location information given in the dataset. We used color to indicate location on the island, varying from pink in the north west via purple and blue to green in the south east. Locations that are close together have colors that resemble each other. Each phone is given a fingerprint that displays the locations of the calls made with that phone. Figure 5 shows these fingerprints for all 400 phones, ordered by phone number. These color bars make it easy to identify the general whereabouts of persons, and to see which persons live or work close together (their bars would resemble each other). The current version shows all bars at equal length, each representing 100% of the calls made by that phone. Hovering over a bar shows, for that phone, a list of cell towers with the number of calls made from each.
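A fingerprint of this kind needs two ingredients: a color for each location and the share of calls made from each tower. The sketch below is an illustration under assumptions, not the tool's implementation: it assumes each tower has already been projected onto a single north west to south east position `t` between 0 and 1, and the RGB gradient stops are picked arbitrarily to match the pink, purple, blue, green description.

```python
from collections import Counter

# Gradient stops from north west to south east (RGB); the exact colors are an assumption
STOPS = [(255, 105, 180), (128, 0, 128), (0, 0, 255), (0, 128, 0)]

def position_to_color(t):
    """Map t in [0, 1] (0 = north west, 1 = south east) to an RGB color."""
    t = min(max(t, 0.0), 1.0)
    scaled = t * (len(STOPS) - 1)
    i = min(int(scaled), len(STOPS) - 2)
    frac = scaled - i
    # Linear interpolation between neighbouring stops keeps nearby places similar in color
    return tuple(round(a + (b - a) * frac) for a, b in zip(STOPS[i], STOPS[i + 1]))

def fingerprint(towers_used):
    """Return [(tower, share_of_calls)]; the shares sum to 1, i.e. a full-length bar."""
    counts = Counter(towers_used)
    total = sum(counts.values())
    return [(tower, n / total) for tower, n in sorted(counts.items())]

# Invented example: four calls, three of them from tower 29
bar = fingerprint([29, 29, 11, 29])
```

Each `(tower, share)` entry would become one colored segment of the bar, with the segment width proportional to the share.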

 

Figure 5 – color bar view; the red arrow (not in the original screenshot) marks the bar for ID 97 (click here for high-res version)

 

From this view we learn that:

- Most people make calls from about 1 or 2 different areas.

- Few people, like ID 309, make calls from 5 or more different areas (very colorful bar).

- Many people, like ID 11, have bars that have a substantial pink and a substantial light green part, which may indicate that they travel back and forth from one side of the island to the other (e.g. commuting).

 

When focusing on our Catalano/Vidro friends we see that:

ID 1 is making calls from cell towers 29 (from a boat) and 11 (city)

ID 2 is making calls in the same area as ID 1, mainly from a boat

ID 3 is making calls from 10 (hills? a lookout point?) and 30 (a boat)

ID 5 seems to be calling mainly from the water, both at the north west and the mid-east coast.

ID 97 makes all calls from one location, the southernmost city on the island (cell tower 22)

ID 137 is traveling between north west and mid-east.

 

From this we conclude that ID 97 is not joining the activities in the field but is rather coordinating high-level activities from a location far from the action. We think that our former suspect, ID 5, is coordinating activities at a merely local level and that ID 97 is David Vidro. Further indications for this are given by a sequence diagram view of ID 97 and the IDs he exchanges calls with (see Figure 6). There is a lot of activity on day 6 between 15:20 and 18:20. ID 97 receives many phone calls from both places at sea, 30, 22 and 29. The same happens the next day, day 7, between 12:54 and 20:12. ID 97 stays at cell tower 22 the whole time. This strengthens the suspicion that he is coordinating, and is thus David Vidro. During these 2 actions, IDs 5, 20, 61, 200, 211, 218, 306 and 313 are participating. From the towers from which they make their calls, we conclude that ID 5 is the local man in the west. He participates in activities from day 0 to day 5, but after that he no longer seems to be involved.

 

Figure 6 – calls from and to David (ID 97), labels not part of original screenshot (click here for high-res version)

 

Then we looked at the changing social structure of the Catalano/Vidro families. The graph of 2006-06-01 (= day 0) (Figure 7) shows the structure at the start of the 10-day period, with IDs 1, 2, 3, 5 and 200 forming a group.

 

 

Figure 7 – social structure at start of the 10 day period (click here for high-res version)

 

Figure 8 – sequence diagram: changes in social structure during the last 6 days (click here for high-res version)

 

Figure 8 shows the sequence diagram of the calls between the 7 people during 2006-06-05 – 2006-06-10. On 2006-06-08 (day 7) there is no contact between any of the 7 people (white area in the sequence diagram). Calling behaviour before and after that date differs significantly. Before that date, the sequence diagram shows that ID 1 and ID 200 have contact with each other and with IDs 2, 3 and 5, while IDs 2, 3 and 5 do not have contact with each other. After 2006-06-08 there is no contact within the group of IDs 1, 2, 3, 5 and 200. ID 200 (Fernando) now has contact with ID 97 and ID 137. Maybe ID 200 was promoted to the higher echelons of the Paraiso movement, and the old ‘gang’ fell apart.
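The before/after comparison can also be made explicit by computing which pairs within the group were in contact in each period and taking set differences. This is a sketch with invented sample calls; the record layout (caller, callee, start time) and the name `contact_pairs` are assumptions for the example.

```python
from datetime import datetime, timedelta

def contact_pairs(calls, group, start, end):
    """Undirected pairs within `group` with at least one call in [start, end)."""
    return {
        frozenset((caller, callee))
        for caller, callee, when in calls
        if caller in group and callee in group and caller != callee and start <= when < end
    }

group_ids = {1, 2, 3, 5, 97, 137, 200}
# Invented calls, for illustration only
sample_timeline = [
    (1, 200, datetime(2006, 6, 6, 10, 0)),
    (1, 2, datetime(2006, 6, 7, 11, 0)),
    (200, 97, datetime(2006, 6, 9, 9, 0)),
    (137, 200, datetime(2006, 6, 10, 14, 0)),
]

gap_day = datetime(2006, 6, 8)  # the day without any contact
before = contact_pairs(sample_timeline, group_ids, datetime(2006, 6, 5), gap_day)
after = contact_pairs(sample_timeline, group_ids, gap_day + timedelta(days=1), datetime(2006, 6, 11))
lost_contacts = before - after     # pairs that stopped calling each other
gained_contacts = after - before   # pairs that only appear after the gap
```

On the real data, `lost_contacts` and `gained_contacts` would list exactly the links that disappear and appear around 2006-06-08.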

 

Figure 9 – Anomaly ID 309 (click here for high-res version)

 

We conclude with an anomaly that we found with our tool: browsing through the graph, we found (among others) ID 309 interesting because of its large number of distinct connections. Figure 9 shows the corresponding sequence diagram, in which only ID 309 and the phones connected to it are shown. The pattern shows that ID 309 mostly receives calls rather than making them. Almost all calls are made during the last 3 days of the 10-day period, from 6AM to midnight...
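Anomalies of this kind can also be surfaced without browsing, by computing a small profile per phone: number of distinct partners, calls made, and calls received. A minimal sketch on invented (caller, callee) pairs; `phone_profiles` is a hypothetical helper, not part of the tool.

```python
from collections import defaultdict

def phone_profiles(calls):
    """Per phone: distinct partners, calls made, and calls received."""
    partners = defaultdict(set)
    made = defaultdict(int)
    received = defaultdict(int)
    for caller, callee in calls:
        partners[caller].add(callee)
        partners[callee].add(caller)
        made[caller] += 1
        received[callee] += 1
    return {
        phone: {"partners": len(partners[phone]), "made": made[phone], "received": received[phone]}
        for phone in partners
    }

# Invented sample in which 309 is called by many phones but rarely calls out
sample_log = [(10, 309), (11, 309), (12, 309), (309, 10), (10, 11)]
profiles = phone_profiles(sample_log)
busiest = max(profiles, key=lambda p: profiles[p]["partners"])
```

Sorting phones by partner count, or by the ratio of received to made calls, would put phones like ID 309 at the top of the list.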